Abstract Body


Background / Context: 

Replication studies allow us to make comparisons and generalizations regarding the 
effectiveness of an intervention across different populations, versions of a treatment, settings and 
contexts, and outcomes. One method for making these comparisons across many replication 
studies is meta-analysis. Meta-analysis methods allow us to answer questions 
like: On average, how effective are interventions of this type? How much does the effectiveness 
vary across studies? And, often most importantly, does the effectiveness vary in relation to 
features of the underlying populations, treatments, settings, or outcomes? 

In many experiments, the effectiveness of a treatment is measured using multiple 
outcomes. For example, in reading intervention studies, measures of fluency, word recognition, 
and comprehension might be collected. In some studies, questions of durability of a treatment 
effect are important; in order to assess this, measures might be collected both at the end of an 
intervention and three or six months later. Traditional meta-analytic methods, however, have 
required effect sizes to be independent, making it difficult for inferences and comparisons to be 
made across different outcomes. In order to ensure independence, the common solution is to 
either select only one outcome from, or to create a single combined measure for, each study for 
inclusion in the meta-analysis. This often results in a loss of information. 

A recent innovation in meta-analysis is the introduction of a robust variance estimator 
that allows for the inclusion of multiple, correlated effect sizes in a meta-analysis (Hedges, 
Tipton, and Johnson, 2010). This method does not require any information on the true correlation 
structure of these estimates, which is particularly important since this information is rarely 
available in primary studies. The statistical theory behind the robust variance estimation (RVE) 
method is asymptotic; in large samples, it has been shown to be an unbiased estimator of the true 
sampling variance. The RVE approach is already widely used in meta-analyses in psychology, 
social welfare, and education. 

Importantly, the RVE estimator is a type of linearization or Taylor-series estimator. Such 
estimators are commonly used in the analysis of panel data in econometrics, in survey sampling (with 
complex sampling designs), and with generalized estimating equations, and they are particularly useful 
when a standard regression model is preferred and the random effects are not of direct interest. 

In particular, the RVE approach is most similar to clustered standard errors (Liang & 
Zeger, 1986), which are used to account for the clustering or nesting of data (e.g., students in 
schools); clustered standard errors are an extension of Huber-White standard errors (Huber, 
1967; White, 1980), which account for heteroskedastic errors in independent 
data. 

Purpose / Objective / Research Question / Focus of Study: 

While the RVE estimator is unbiased in large samples, its small-sample properties are 
often not ideal. Previous simulation studies have shown that, across many different effect sizes, 
when the number of studies is less than 40, the associated confidence intervals often under-cover 
and the associated hypothesis tests have Type I error rates far above nominal (Hedges et al., 
2010; Tipton, 2013; Williams, 2012). While these studies have varied the effect size and number 
of primary studies, other conditions, including the number and types of covariates 
used in the meta-regression models, have not been studied. To date, the main conclusion from 
these studies is that RVE results should not be trusted in meta-regression models with fewer 
than 40 studies. This is a real limitation for the method given that at least 50% of meta-analyses in 
education contained fewer than 40 studies (Ahn, Ames, & Myers, 2012). 

Significance / Novelty of study: 

This paper investigates possible approaches to adjusting the RVE estimator when the 
number of studies is small (less than 40), which is common in both meta-analyses and 
replication studies in education. These adjustments are based on work by Bell and McCaffrey 
(2002) and McCaffrey, Bell, and Botts (2001), which themselves are extensions of adjustments 
found in MacKinnon and White (1985). These include three methods for adjusting the residuals 
used in RVE and two methods for adjusting the degrees of freedom used for making inferences. 
In order to evaluate how well these methods perform in practice, we present results of two 
simulation studies: in the first study, we focus on several meta-regression models with a single 
covariate, while the second study focuses on a larger meta-regression model that mirrors the type 
of models found in practice. 


Statistical, Measurement, or Econometric Model: 

The RVE approach can be used whenever researchers seek to combine information across 
studies and at least one of these studies includes multiple outcomes. The fundamental problem 
with combining these effect sizes is that they are not independent. There are two types of 
correlation structures addressed by this method: “correlated effects”, which arise from multiple 
measures on the same units, and “hierarchical effects”, which occur because independent 
experiments conducted in the same laboratory often share many features (e.g., protocols, study 
populations). 

RVE can be used both to estimate an average effect size across all studies (and outcomes) 
and to estimate meta-regression models. These models allow for comparisons to be made 
across features of the population, versions of the treatment, types of outcomes, and features of 
the study context or setting. 

Let study $j = 1, \ldots, m$ have a vector of $k_j$ effect size estimates $T_j$, a design matrix $X_j$, and a 
weight matrix $W_j$. Here $X_j$ arises from the design of the meta-analysis and may include an 
intercept as well as covariates that vary across studies or effect sizes. Assume each study $j = 1, \ldots, m$ 
also has an associated vector of $k_j$ residuals $\varepsilon_j$. We can relate these via the regression 

$$T = X\beta + \varepsilon$$

where $T = (T_1', \ldots, T_m')'$ is a vector of $m$ stacked vectors, each with $k_j$ effect size estimates, 
$X = (X_1', \ldots, X_m')'$ is a design matrix of $m$ stacked matrices, each of dimension $k_j \times p$, and $\beta$ is a 
$p \times 1$ vector of coefficients to be estimated. Finally, let $\varepsilon = (\varepsilon_1', \ldots, \varepsilon_m')'$ be the vector of stacked 
error vectors, each of dimension $k_j \times 1$. 

The regression coefficients $\beta$ can be estimated using weighted least squares as 

$$b = \left(\sum_{j=1}^{m} X_j' W_j X_j\right)^{-1} \left(\sum_{j=1}^{m} X_j' W_j T_j\right).$$

The robust variance estimation method proposes to estimate the variance of $b$ using 

$$V^R = \left(\sum_{j=1}^{m} X_j' W_j X_j\right)^{-1} \left(\sum_{j=1}^{m} X_j' W_j A_j e_j e_j' A_j' W_j X_j\right) \left(\sum_{j=1}^{m} X_j' W_j X_j\right)^{-1} \qquad (1)$$

where $e_j = T_j - X_j b$ is the $k_j \times 1$ residual vector in the $j$th study and, in the standard RVE 
estimator, $A_j = I_j$. Based on this, hypotheses of the form $\beta_k = 0$ can be tested using the Wald 
statistic 

$$t_k = b_k / \sqrt{V^R_{kk}}. \qquad (2)$$

In order to test if $\beta_k = 0$, this test rejects the null hypothesis if $|t_k| > t_{m-p,\alpha}$, where $t_{m-p,\alpha}$ is the 
level-$\alpha$ t-value with $df_{HTJ} = m - p$ degrees of freedom. 
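
As a rough illustration of the estimator and test described above, the following Python sketch works through the simplest special case: an intercept-only model (p = 1, every row of X_j equal to 1) with one scalar weight per study (W_j = w_j I_j) and A_j = I_j. The function name `rve_intercept_only`, the example data, and the restriction to scalar weights are assumptions made for illustration; they are not taken from the paper.

```python
import math

def rve_intercept_only(effects, weights):
    """Robust variance estimate for an intercept-only meta-analysis.

    effects: one list of effect size estimates per study (T_j).
    weights: one scalar weight per study (assumes W_j = w_j * I_j).
    Returns (b, v_r, t_stat, df) with df = m - p = m - 1.
    """
    m = len(effects)
    # X'WX reduces to sum_j w_j * k_j when every design row is 1.
    xwx = sum(w * len(t_j) for w, t_j in zip(weights, effects))
    # Weighted least squares estimate of the average effect size.
    b = sum(w * sum(t_j) for w, t_j in zip(weights, effects)) / xwx
    # "Meat" of the sandwich: sum_j (X_j' W_j e_j)^2 with A_j = I_j.
    meat = sum((w * sum(t - b for t in t_j)) ** 2
               for w, t_j in zip(weights, effects))
    v_r = meat / xwx ** 2
    t_stat = b / math.sqrt(v_r)
    return b, v_r, t_stat, m - 1

# Hypothetical data: three studies reporting 2, 1, and 3 effect sizes.
effects = [[0.2, 0.4], [0.1], [0.5, 0.3, 0.4]]
b, v_r, t_stat, df = rve_intercept_only(effects, weights=[1.0, 1.0, 1.0])
print(f"b = {b:.3f}, V^R = {v_r:.5f}, t = {t_stat:.2f}, df = {df}")
```

The resulting statistic would then be compared with a t critical value on df = m - p degrees of freedom, which is the unadjusted test whose small-sample behavior is examined here.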

In this paper, we investigate two small sample corrections, one to the residuals and 
another to the degrees of freedom. The corrections to the residuals come through the Aj 
adjustment matrix shown in Eqn. (1) above. We investigate three such adjustments: 

2) $A_j = [m/(m-1)]^{1/2}(I_j - H_{jj})^{-1}$, and 

3) $A_j = (I_j - H_{jj})^{-1/2}$, 

where $H_{jj} = X_j Q X_j' W_j$ and $Q = (X'WX)^{-1}$. Second, we investigate adjustments to the degrees of 
freedom. The first correction, proposed by Hedges et al. (2010), is to use the t-distribution with 
$df_{HTJ} = m - p$, where $m$ is the number of studies and $p$ is the number of predictors in the 
meta-regression model. The second correction, proposed by McCaffrey, Bell, and Botts (2001), is to 
estimate the degrees of freedom using the Satterthwaite (1946) approximation. This results in 

$$df_{Sk} = \left(\sum \lambda_{jk}\right)^2 / \sum \lambda_{jk}^2,$$

where $\lambda_{jk}$ are the $\sum k_j$ eigenvalues of $\Sigma^{1/2}\left(\sum_j g_{jk} g_{jk}'\right)\Sigma^{1/2}$, for the covariance matrix $\Sigma = E(\varepsilon\varepsilon')$, which 
is a block diagonal matrix composed of the $m$ $\Sigma_j$ matrices. In the full paper, we provide specific 
estimation strategies for both the residual adjustments ($A_j$) and the degrees of freedom 
adjustments ($df_{Sk}$) particular to the weighting strategies and correlation problems found in RVE 
in meta-analysis (i.e., correlated effects, hierarchical effects). 
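
The two kinds of correction can be sketched for a toy problem. The block below is a minimal illustration and not the estimation strategy of the full paper. It assumes an intercept-only model with scalar study weights, in which case H_jj = (w_j / sum_j w_j k_j) 1 1' and the adjusted residual sums have simple closed forms. The inverse-square-root adjustment shown is the usual bias-reduced linearization form and is an assumption here, as are all function names and data. The degrees-of-freedom helper relies on the identities sum(lambda) = tr(M) and sum(lambda^2) = tr(M M) for a symmetric matrix M, so it needs no eigenvalue routine; how M is built for a particular meta-regression is not shown.

```python
import math

def adjusted_rve_variances(effects, weights):
    """Unadjusted, square-root-adjusted, and jackknife-type variances of b
    for an intercept-only model with scalar weights (W_j = w_j * I_j)."""
    m = len(effects)
    xwx = sum(w * len(t_j) for w, t_j in zip(weights, effects))
    b = sum(w * sum(t_j) for w, t_j in zip(weights, effects)) / xwx
    plain = root = jack = 0.0
    for w, t_j in zip(weights, effects):
        score = w * sum(t - b for t in t_j)      # X_j' W_j e_j
        h = w * len(t_j) / xwx                   # leverage of study j
        plain += score ** 2                      # A_j = I_j
        root += score ** 2 / (1.0 - h)           # A_j = (I_j - H_jj)^(-1/2)
        jack += score ** 2 / (1.0 - h) ** 2      # A_j proportional to (I_j - H_jj)^(-1)
    jack *= m / (m - 1)                          # [m/(m-1)]^(1/2), squared
    return plain / xwx ** 2, root / xwx ** 2, jack / xwx ** 2

def satterthwaite_df(matrix):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues) of a symmetric
    matrix, computed from traces: tr(M)^2 / tr(M M)."""
    n = len(matrix)
    trace = sum(matrix[i][i] for i in range(n))
    trace_sq = sum(matrix[i][j] * matrix[j][i]
                   for i in range(n) for j in range(n))
    return trace ** 2 / trace_sq

effects = [[0.2, 0.4], [0.1], [0.5, 0.3, 0.4]]   # hypothetical data
v_plain, v_root, v_jack = adjusted_rve_variances(effects, [1.0, 1.0, 1.0])
df_equal = satterthwaite_df([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
df_unequal = satterthwaite_df([[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

In this example the adjusted variances are larger than the unadjusted one, with the jackknife-type value the largest. With equal eigenvalues the approximate degrees of freedom equal the number of eigenvalues (3 in the first call), while one dominant eigenvalue pulls them down (36/18 = 2 in the second).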

The three residual corrections and the two degrees of freedom adjustments yield 
six possible small-sample corrections to RVE. In order to investigate how well 
these perform in small samples, we conducted two simulation studies. In the first simulation 
study we focus on simple meta-regression models, each with only one covariate. We focus here 
on the role of variable type on degrees of freedom, Type I error rates, and statistical power. We 
are interested in the role of variable type since previous research suggests that statistical 
properties of the $V^R_{kk}$ estimators depend on both the degree to which the covariates are balanced 
and on the leverage of the observations (Bell & McCaffrey, 2002; Chesher & Austin, 1991; Long 
& Ervin, 2000; MacKinnon, 2013). In the second simulation study, we attempt to mirror practice 
more closely by comparing properties of the six corrections for meta-regression models with 4 
covariates. Here we focus on m=20 studies and include all four covariate types found in Study 1. 

Both simulation studies focus on the test found in (2) above. For the α-level α = 0.05, in 
both Study 1 and Study 2 we investigate how Type I error rates vary in relation to the number of 
studies (m), and the types of predictors; in addition, in Study 1, we investigate the statistical 
power of the test for three true regression coefficient relationships (small, medium, large). 
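
A Type I error simulation of this kind can be sketched as follows. This is a deliberately simplified stand-in rather than the design used in the two studies: it uses an intercept-only model, equal weights, a hypothetical data-generating process (a shared study-level deviation plus independent noise), and the tabulated two-sided 5% critical value 2.262 for a t-distribution with 9 degrees of freedom. All names and parameter values are assumptions.

```python
import math
import random

T_CRIT_DF9 = 2.262  # two-sided 5% critical value of t with m - p = 9 df

def rve_t_statistic(effects):
    """Unadjusted RVE t statistic for the mean effect (equal weights)."""
    total_k = sum(len(t_j) for t_j in effects)
    b = sum(sum(t_j) for t_j in effects) / total_k
    # Sandwich "meat" with A_j = I_j: squared residual sums per study.
    meat = sum(sum(t - b for t in t_j) ** 2 for t_j in effects)
    return b / math.sqrt(meat / total_k ** 2)

def simulate_type1_error(m=10, k=3, reps=2000, seed=1):
    """Share of replications rejecting a true null (beta = 0) at alpha = 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        effects = []
        for _ in range(m):
            shared = rng.gauss(0.0, 0.3)  # induces correlation within a study
            effects.append([shared + rng.gauss(0.0, 0.2) for _ in range(k)])
        if abs(rve_t_statistic(effects)) > T_CRIT_DF9:
            rejections += 1
    return rejections / reps

rate = simulate_type1_error()
print(f"Empirical Type I error rate: {rate:.3f}")
```

In this balanced, equal-weight setup the unadjusted statistic equals the ordinary one-sample t statistic on the study means multiplied by [m/(m-1)]^(1/2), so the rejection rate should sit somewhat above 0.05. Unbalanced covariates and high-leverage studies, which are the conditions of interest in the two simulation studies, are not represented in this sketch.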

Findings / Results: 

The results of our simulation studies include 4 main findings: 

1) The most important result is that the estimator proposed by Hedges et al. (2010) only 
performs well in very limited circumstances; when covariates are unbalanced or have high 
leverage, and particularly when the number of studies is small, the Type I error rates can be 
far larger than the stated α = 0.05. (See Figures 1 and 2.) 

2) The second major finding is that the largest improvements to RVE arise through the 
use of Satterthwaite degrees of freedom, and that no estimator performs well when these degrees 
of freedom are smaller than 4. 

3) Third, our simulations suggest that two estimators perform well in a wide variety of 
situations: the bias-reduced linearization estimator for weighted least squares proposed by 
McCaffrey, Bell, and Botts (MBBS; 2001) and the jackknife estimator (JKS). (See Figures 1 and 
2.) 

4) The jackknife estimator (JKS) is typically more conservative than the MBBS 
estimator, and, as a result, it is also less powerful. Over the parameters included in our study of 
power, the MBBS is uniformly more powerful than the jackknife. (See Figure 3). 

Usefulness / Applicability of Method: 

In order to illustrate the usefulness of the method, we include an example based on a 
meta-analysis by Tanner-Smith and Lipsey (2013). This meta-analysis combined results of 
randomized experiments evaluating the effectiveness of brief alcohol interventions (< 5 hours of 
contact time, < 4 weeks in duration) among adolescents and young adults. In these analyses, the 
outcomes include measures of the first alcohol consumption after the experiment ended. In the 
example, we focus on a subset of m = 28 studies (containing 300 effect sizes). Given the findings 
of the simulation studies, we compare results based on the original estimator given by Hedges et 
al. (2010) with those from the MBBS and JKS estimators developed in this paper. 

We focus on a meta-regression model with 4 covariates. These include a variety of 
variable types, similar to the types under study in our simulation studies. We estimated this 
model in the statistical program R (R Development Core Team, 2012) using a correlated effects 
RVE model with an assumed ρ = 0.80. The results of the meta-regression are presented in Table 
1 in Appendix B. This table illustrates two points. First, the (recommended) Satterthwaite 
degrees of freedom vary highly from covariate to covariate, and for some covariates (even with 
m = 28 studies) can be quite small. Second, the p-values differ between the three tests, with the 
most liberal results coming from the unadjusted results (Hedges et al., 2010) and the most conservative 
from the jackknife (JKS); the MBBS results are in between. 

Conclusions: 

As our example illustrates, using the MBB or JK estimators with Satterthwaite degrees of 
freedom can impact the conclusions drawn from a robust meta-regression. Most commonly these 
differences arise because of degrees of freedom differences. These degrees of freedom 
differences can be large and are directly related to the degree of balance and maximum leverage 
in the data. Importantly, they often lead to different conclusions. 

Moreover, since the Satterthwaite degrees of freedom of these small-sample 
adjustments depend not just on the number of studies, but also on the type of variable 
(dichotomous, continuous), the level of the covariate (study, effect size), the degree of balance 
across studies, and the presence of high leverage values, our simulation studies suggest that it is 
difficult to know at what point small-sample corrections are no longer needed. Even with m = 40 
studies, the probability of a Type I error for the standard RVE estimator can be much larger than 
α = 0.05. For this reason we argue that it is best if the corrections provided here are implemented 
in all RVE analyses, even those of moderate to large sizes. 
Appendix B. Tables and Figures 


Table 1: Example analysis using RVE with HTJ, MBBS, and JKS small sample corrections 

Coefficient B       SE(B)   SE(B)    SE(B)   df    df    Prob(|t|>)  Prob(|t|>)  Prob(|t|>)
                    HTJ     MBB/HTJ  JK/HTJ  MBBS  JKS   HTJ         MBBS        JKS
Intercept   0.704   0.320   1.044    1.252   9.7   7.9   0.030       0.055       0.102
Personal    0.177   0.104   1.087    1.359   3.1   2.5   0.103       0.223       0.313
% white     -0.950  0.389   1.051    1.267   4.3   3.0   0.023       0.089       0.150
Wave_c      -0.060  0.025   0.920    0.988   6.7   5.9   0.023       0.039       0.052
Wave_m      0.067   0.106   0.991    1.119   13.7  12.8  0.531       0.530       0.580

Note: the above analysis uses approximately inverse-average variance weights based on 
an RVE model using correlated effects weights with an assumed ρ = 0.80. The between-study 
variation used for these weights was τ² = 0.037. The model is based on m = 28 
studies with a total of N = 300 effect sizes. 

Figure 1: Boxplot comparison of Type I error rates of six variance estimators and eight variable types 

Note: The first letters signify the adjustment, while the last letter signifies the degrees of freedom (H = m-2, 
S = Satterthwaite); within each estimator, the variables from left to right follow those found in Table 3; 
for those with df = S, only values corresponding to df > 4 are shown. 

Figure 2: Boxplot comparison of Type I error rates of six variance estimators for a meta-regression model with four variables (axis label: Type I error) 

Note: The first letters signify the adjustment, while the last letter signifies the degrees of freedom (H = m-p, S = Satterthwaite); 
within each estimator, the variables from left to right follow those found in Table 5; the results look across all three models and 
two values of k&n with m=20. 

Figure 3: Boxplot comparing the power of t-tests using MBBS and JKS adjustments (axis label: power to reject the null hypothesis) 

Note: The bars indicate the power of the JKS test for each variable and effect size combination, across all 
parameter values studied. Only tests with Satterthwaite df > 4 are shown. 